Error Detection for Treebank Validation

نویسندگان

  • Bharat Ram Ambati
  • Rahul Agarwal
  • Mridul Gupta
  • Samar Husain
  • Dipti Misra Sharma
چکیده

This paper describes an error detection mechanism which helps in validation of dependency treebank annotation. Consistency in treebank annotation is a must for making data as errorfree as possible and for assuring the usefulness of treebank. This work is aimed at ensuring this consistency and to make the task of validation cost effective by detecting major errors induced during completely manual annotation. We evaluated our system on the Hindi dependency treebank which is currently under development. We could detect 76.63% of errors at dependency level. Results show that our system performs well even when the train-

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A High Recall Error Identification Tool for Hindi Treebank Validation

This paper describes the development of a hybrid tool for a semi-automated process for validation of treebank annotation at various levels. The tool is developed for error detection at the part-of-speech, chunk and dependency levels of a Hindi treebank, currently under development. The tool aims to identify as many errors as possible at these levels to achieve consistency in the task of annotat...

متن کامل

Linguistic Issues in Language Technology LiLT

Treebanks are a linguistic resource: a large database where the morphological, syntactic and lexical information for each sentence has been explicitly marked. The critical requirements of treebanks for various NLP activities (research and application) are well known. This also implies that treebanks need to be as error free as possible. However, manual validation of a treebank is very costly, b...

متن کامل

Automatic Error Detection in Annotated Corpora

Annotated corpus is a linguistic resource which explicitly encodes the information at syntactic and semantic levels for each sentence. Annotated corpora play a crucial role in many applications of natural language processing (NLP). Error free and consistent annotated corpora is vital for these applications. Creating annotated corpora is an expensive and time consuming process. Errors or anomali...

متن کامل

An Automatic Approach to Treebank Error Detection Using a Dependency Parser

Treebanks play an important role in the development of various natural language processing tools. Amongst other things, they provide crucial language-specific patterns that are exploited by various machine learning techniques. Quality control in any treebanking project is therefore extremely important. Manual validation of the treebank is one of the steps that is generally necessary to ensure g...

متن کامل

تصحیح خودکار خطا در درخت بانک نحوی با استفاده از یادگیری ماشینی انتقال محور

The Treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks such as speech recognition, spoken language systems, parsing and machine translation. Treebank can be developded in different ways that could be, generally, categorized in manually and statistical approaches. While the resulted Treebank in each of these methods has the annotation error,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011